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Abstract 

Background: Dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various 
biological networks, including the gene regulatory network (GRN). Most current methods for learning DBN employ 
either local search such as hill-climbing, or a meta stochastic global optimization framework such as genetic algorithm 
or simulated annealing, which are only able to locate sub-optimal solutions. Further, current DBN applications have 
essentially been limited to small sized networks. 

Results: To overcome the above difficulties, we introduce here a deterministic global optimization based DBN 
approach for reverse engineering genetic networks from time course gene expression data. For such DBN models 
that consist only of inter time slice arcs, we show that there exists a polynomial time algorithm for learning the 
globally optimal network structure. The proposed approach, named GlobalMIT + , employs the recently proposed 
information theoretic scoring metric named mutual information test (MIT). GlobalMIT + is able to learn high-order time 
delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both 
synthetic and real data sets, including a 733 cyanobacterial gene expression data set, shows significantly improved 
performance over other techniques. 

Conclusions: Our studies demonstrate that deterministic global optimization approaches can infer large scale 
genetic networks. 



Background 

Gene regulatory network (GRN) reverse-engineering has 
been a subject of intensive study within the systems biol- 
ogy community during the last decade. Of the dozens 
of methods available currently, most can be broadly 
classified into three main-stream categories, namely co- 
expression network, differential equation and Bayesian 
network. Co-expression network [1,2] is a class of coarse- 
scale, simplistic models that relies directly on pairwise 
or low-order conditional pairwise association measures, 
such as the (partial) correlation or (conditional) mutual 
information, for inferring the connectivities between 
genes. These methods have the advantage of low com- 
putational complexity, and can scale up to very large 
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networks of thousands of genes [3]. However, their major 
limitation is that they do not model the network dynam- 
ics, and hence cannot perform prediction. Differential 
equation (DE) based approaches are a class of sophis- 
ticated, well established methods which have long been 
used for modeling biochemical phenomena, including 
GRNs [4,5]. A particularly salient feature of DE-based 
approaches is that they can accurately model the detailed 
dynamics of biochemical systems in continuous time. 
However, these methods are also much more computa- 
tionally intensive, and so far are only applicable to rela- 
tively small networks of a handful genes (i.e., 5-30). Lying 
in-between these two extremes are Bayesian networks 
(BN), a class of models that are based on solid principles 
of probability and statistics. A BN represents accurately 
and compactly the joint distribution of a set of variables, 
using probability and graph theories. BN can also perform 
prediction of the GRN behavior in unknown conditions, 
albeit not at as detailed level as DE-based approaches. 
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In this paper, we focus on the BN paradigm, which is 
indeed among the first approaches for reverse engineering 
GRN, through the seminal work of Friedman et al. [6,7], 
and later by numerous other authors [8-14]. Two crit- 
ical limitations when applying the traditional static BN 
paradigm to the GRN domain are: (i) BN does not have 
a mechanism for exploiting the temporal aspect of time- 
series data (such as time-series microarray data) abundant 
in this field; and (ii) BN does not allow the modeling 
of cyclic phenomena, such as feedback loops, which are 
prevalent in biological systems [15]. These limitations 
motivated the development of the dynamic Bayesian net- 
work (DBN) which has received significant interest from 
the bioinformatics community [15-22]. DBN exploits the 
temporal aspect of time series data to infer edge direc- 
tions, and also allows the modeling of feedback loops (in 
the form of time delayed cyclic interactions). 

In DBN framework, the task of GRN reverse 
engineering amounts to learning the optimal DBN struc- 
ture from gene expression data. After the structure has 
been reconstructed, a set of conditional probability tables 
can be easily learned, using methods such as maximum 
likelihood, to describe the system dynamics. In this 
paper, we are focusing on the more challenging problem 
of structure learning. Most of the recent works have 
employed either local search (e.g., greedy hill climbing), 
stochastic global optimization (e.g., genetic algorithm, 
simulated annealing), or Monte Carlo simulation. This is 
due to several NP-hardness results for learning static BN 
structure (see e.g., [23]). However recently, Dojer [24] has 
shown otherwise that for certain DBN models, learning 
can be efficiently done in polynomial time for the globally 
optimal DBN, when the Minimum Description Length 
(MDL) and the Bayesian-Dirichlet equivalent (BDe) scor- 
ing metrics are employed. In our recent preliminary work 
[25], we have shown that this result also holds true for the 
Mutual Information Test (MIT), a novel scoring metric 
recently introduced for learning static BN [26]. Through 
extensive experimental evaluation, de Campos [26] sug- 
gested that MIT can compete favorably with Bayesian 
scores, outperform MDL (which is equivalent to the 
Bayesian Information Criterion — BIC) and hence should 
be the score of reference within those based on informa- 
tion theory. To our knowledge, other than the popular 
scoring metrics, MIT has not been considered for learn- 
ing DBN. An attractive characteristic of MIT is that when 
placed into a global optimization framework, its complex- 
ity is much lower than that of the BDe-based algorithm 
by Dojer [24], and seems to be comparable to that of the 
MDL-based algorithm. In other words, MIT seems to 
combine the goodness of both BDe and MDL, namely 
network quality and speed. The implementation of our 
MIT based algorithm, made available as the GlobalMIT 
toolbox [27], when tested on small scale synthetic data 



[25], confirmed that MIT also performs competitively 
with BDe and MDL in terms of network quality. 

The first-order Markov DBN model that we considered 
earlier [25,27] is however not completely adequate for the 
accurate modeling of GRN, as genetic interactions are 
invariably delayed with different time lags [20]. Specifi- 
cally, this delay is due to the time required for the regulator 
gene to express its protein product and the transcription 
of the target gene to be affected (directly or indirectly) 
by this regulator protein. In GRNs, most genetic interac- 
tions are time delayed, depending on the time required for 
the translation, folding, nuclear translocation, turnover 
for the regulatory protein, and elongation of the target 
gene mRNA [28]. Furthermore, the amount of time lag 
needed for different regulator to exert its effect is also 
different. Higher order DBNs are therefore needed to cap- 
ture these time-delayed interactions. In this paper, we 
generalize our GlobalMIT algorithm to the case of higher 
order DBN models, to be named GlobalMIT + . Our con- 
tribution in this paper is three-fold: (i) we prove the 
polynomial time complexity of GlobalMIT + for higher 
order DBNs; (ii) we give a complete characterization of 
the time complexity of GlobalMIT + , and propose a vari- 
ant GlobalMIT' for large scale networks that balances 
optimality, order coverage and computational tractability; 
(iii) we evaluate the high-order GlobalMIT" 1 "/* on several 
real and synthetic datasets, and for the first time apply 
a DBN-based GRN reverse engineering algorithm on a 
large scale network of 733 cyanobacterial genes, in a very 
reasonable run-time on a regular desktop PC. We show 
that the learned networks exhibit a scale-free structure, 
the common topology of many known biochemical net- 
works, with hubs with significantly enriched functionals 
corresponding to major cellular processes. 

Methods 

Preliminaries 

We first briefly review the DBN models. Let X = 
{Xi,...,X n } be a set of random variables (RV); 
{xa,...,Xiif} be an actual observed sequence cor- 
responding to X{ over N time points; X,[£] be the 
RV representing the value of X, at any time t; and 
X[ t] = {Xi [ t] , . . . ,X n [ t] }. A DBN represents the joint 
probability distribution function (PDF) over the set of 
n x N RVs X[ 1] UX[2] . . . U X[N]. Since such a general 
PDF can be arbitrarily complex, several assumptions 
are often employed for its simplification. The two most 
popular assumptions are first-order Markov ianity, i.e., 
P(X[ t] |X[ 1] , . . . ,X[ t - 1] ) = P(X[ t] |X[ t - 1] ), and 
stationarity, i.e., P(X[t] \X[t — 1] ) is independent of t. 
These two assumptions give rise to the popular first- 
order Markov stationary DBN which assumes that both 
the structure of the network and the parameters char- 
acterizing it remain unchanged over time. It is worth 
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noting that recent works have progressed to allow more 
flexible, non-stationary DBN models, such as ones with, 
either parameters [22], or both structure and parameters 
[29] changing over time. However, more flexible models 
generally require more data to be learned accurately. 
In situations where training data are scarce, such as in 
microarray experiments where the data size can be as 
small as a couple of dozen samples, a simpler model such 
as the Markov stationary DBN might be a more suitable 
choice. 

DBN models consist of two parts: the prior network and 
the transition network [30]. The prior network contains 
only intra time slice edges (since there are no other time 
slices preceding it), while the transition network can con- 
tain both inter and intra time slice edges, as demonstrated 
in Figure l(a,b). Learning the prior network requires col- 
lecting m independent observation sequences, of which 
only m initial time slices are used for learning. For biologi- 
cal networks, such data abundance is not always available, 
since there may be only one or a very limited number of 
time series. Therefore, only the learning of the transition 
network is practical and is relevant. Henceforth, by DBN 
we mean only the transition network part of the model. 
Some authors have further restricted the transition net- 
work to contain only inter time slice edges [18,21,24]. In 
the context of genetic networks, inter-time slice edges 
correspond to time-delayed genetic interactions, while 
intra-time slice edges correspond to instantaneous inter- 
actions. In reality, only delayed genetic interactions are 
biologically plausible, as a result of the time required for 
the translation, folding, nuclear translocation, turnover 
time-scales for the regulatory protein, and the time scale 
for elongation of the target gene mRNA [28]. Only when 
this total time lag is small compared to the sampling gap, 
then the interaction can be considered as instantaneous. 
In this paper we shall consider DBN with only inter-time 
slice edges. The rationale for this focus can be taken from 
both a biological point of view (genetic interactions are 
essentially time-delayed), and from an algorithmic point 




(X3[q) 



Figure 1 (a) prior network; (b) First-order Markov transition 
network; (c) 2nd-order Markov transition network with only 
inter time slice edges. 



of view: there are efficient polynomial time algorithms for 
learning this class of DBN, as will be discussed in the next 
section. 

A critical limitation of the first-order DBN for modeling 
GRN is that it assumes every genetic interaction to have 
a uniform time lag of 1 time unit, i.e., all edges are from 
slice [ t — 1] to [ t]. For GRNs this is not always the case, 
since genetic interactions can have longer lags, and differ- 
ent transcription factors (TF) of the same gene can have 
different lags [20]. As mentioned earlier, this motivates the 
use of higher order DBNs, in which the first-order Marko- 
vianity is replaced by the d th -order Markovianity, i.e., 
P(X[ t] |X[ 1] , . . . ,X[ t-1] ) = P(X[ t] |X[ f-1] , . . . , X[ t- 
d] ). With this model, a node (i.e., gene) can have parents 
(i.e., TFs) in any of the previous d time slices. A 2nd-order 
Markov DBN is illustrated in Figure 1(c), in which node Xi 
is regulated by two parents, namely X3 with one-time-unit 
lag, and X\ with two-time-unit lag. 

The MIT scoring metric 

In this section, we first review the MIT scoring metric 
for learning BN and then show how it can be adapted to 
the DBN case. The most popular approaches for learn- 
ing DBN are essentially those that have been adapted 
from the static BN literature, namely the search+ score 
paradigm [15,21], and Markov Chain Monte Carlo 
(MCMC) simulation [18,29]. In this paper we apply the 
search+score approach, in which we specify a scoring 
function to assess the goodness-of-fit of a DBN given the 
data, and a search procedure to find the optimal network 
based on this scoring metric. While several popular scor- 
ing metrics for static BN, such as the Bayesian scores 
(K2, BD, BDe and BDeu), and the information theoretic 
scores (BIC/MDL, Akaike Information Critetion — AIC), 
can be adapted directly for DBNs, we focus on the Mutual 
Information Test (MIT), a recently introduced scoring 
metric for learning BN [26]. Briefly speaking, under MIT 
the goodness-of-fit of a network is measured by the total 
mutual information shared between each node and its 
parents, penalized by a term which quantifies the degree 
of statistical significance of this shared information. To 
understand MIT, let {r\, . . . , r„] be the number of discrete 
states corresponding to our set of RVs X = {X\, . . . ,X n }, 
D denote our data set of N observations, G be a BN, and 
Pa,- = {Xn, . . . ,Xt Si ] be the set of parents of X, in G with 
corresponding [r&, . . ., r, Si } discrete states, and = |Pa,|. 
The MIT score is defined as: 



SS M it(G :D)= 



2N.I(X i ,T>a i )-J2xc l J iaiW 
7=1 



where / (X{, Pa;) is the mutual information between X, and 
its parents as estimated from D. Xa,l r * s the va l ue such 
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that p(x 2 (hj) — Xa.lij} = a (the Chi-square distribution 
at significance level 1 — a), and the term lun(i) is defined as: 



in - l)(r i(Ti (j) - 1) U' k =i r icn{k), 7 = 2. 

in - i)(run(fi ~ i). i = 1 



m(j) 



■ ,Si 



where cr,- = {cr 2 (l), . . . , 07 {si)} is any permutation of the 
index set {1 ... s, } of Pa,-, with the first variable having the 
greatest number of states, the second variable having the 
second largest number of states, and so on. 

To make sense of this criterion, let us first point out 
that maximizing the first term in the MIT score, i.e., 
27V ■ I(Xi, Pa,), can be shown to be equivalent to maxi- 
mizing the log-likelihood criterion. However, learning BN 
by using the maximum likelihood principle suffers from 
overfitting, as the fully-connected network will always 
have the maximum likelihood. Likewise, for the MIT 
criterion, since the mutual information can always be 
increased by including additional variables to the par- 
ent set, i.e., I(Xi, Pa, U Xj) > I(Xi, Pa ; ), the complete net- 
work will have the maximum total mutual information. 
Thus, there is a need to penalize the complexity of 
the learned network. Penalizing the log-likelihood crite- 
rion with-iC(G)log(AT) gives us the BIC/MDL criteria, 
while — C(G) gives us the AIC criterion (where C(G) = 
YH=i^ r i ~ 1) XYjLi r ij measures the network complexity). 
As for the MIT criterion, while the mutual information 
always increases when including additional variables to 
the parent set, the degree of statistical significance of 
this increment become negligible as more and more vari- 
ables are added. This significance degree can be quantified 
based on a classical result in information theory by Kull- 
back [31], which, in this context, can be stated as follows: 
under the hypothesis that Xi and Xj are conditionally inde- 
pendent given Pa, is true, the statistics 27V • /(X,,Xy|Pa;) 
approximates to a / 2 (Z) distribution, with / = (r, — l)(r; — 
l)qi degree of freedom, and qi = 1 if Pa, = 0, otherwise 
qi is the total number of states of Pa ( , i.e., qi = \Yi=i rik ' 
Thus the second term in the MIT score penalizes the addi- 
tion of more variables to the parent set. Roughly speaking, 
only variables that have the conditional mutual informa- 
tion shared with X, given all the other variables in Pa, that 
is higher than 100a percent of the MI values under the null 
hypothesis of independence can increase the score. An 



important difference between MIT and the other informa- 
tion theoretic based metrics (BIC/MDL, AIC) is that the 
penalty term is applied individually and independently to 
each RV rather than to the network as a whole. For further 
details on the motivation and derivation of this scoring 
metric as well as an extensive comparison with BIC/MDL 
and BD, we refer readers to [26]. 

We next show how MIT can be adapted for the case 
of high-order DBN learning, by carefully addressing the 
issue of data alignment. The mutual information is now 
calculated between a parent set and its child at differ- 
ent time lags. At any time t > d, let Pa/ = [Xn[t — 
, . . . ,Xi Si [t — S( Si ] } be the parent set of Xi[t], with 
[8ft, . . . , 8i Si } be the actual regulation order corresponding 
to each parent. In this work, since we only consider DBN 
with inter time slice edges, 1 < <5y < d, V/ for a <f-th order 
DBN. When the mutual information is calculated, the tar- 
get node is always shifted by d units forward in time, while 
the parents are shifted forward by \d — &n, . . . , d — <5 ;Si } 
time units respectively. We define I s as a time-delayed 
mutual information operator, which automatically shifts 
the target variable as well as all of its parents to the correct 
alignment. 

The number of effective observations N e is therefore 
N e = N — d, if we have only one time series of length 
N. If there are m separate time series, it is imperative 
that no wrong alignments occur at the transition between 
these time series when they are concatenated. The num- 
ber of effective observations for multiple time series is 
N e = YllLi Ni — md where N/s are the length of the time 
series. The MIT score for DBN is calculated as: 



S' mT {G:D) = 



2A/ e -/ s (X i ,Pa ! -)-^ 



7=1 



To make this clear, we demonstrate the process of data 
alignment through the simple DBN example given in 
Figure 1(c). For node X 2 , Pa 2 = [Xi [ t - 2] , X 3 [ t - 1] }, 
therefore when I s (-) operates, it shifts the target node 
X2 forward by two units in time, while the parent X\ 
is shifted zero unit, and parent X3 is shifted 1 unit, as 
shown in Figure 2. The number of effective observa- 
tions is N e = N\ — 2 if only the first sequence is used, or 



1 2 N,-2N 1 -1N 1 1 2 N 2 -2N 2 -1N 2 1 2 N 3 -2N 3 -1N 3 




Figure 2 Data alignment for nodeX^ in the DBN in Figure 1(c). Shaded cells denote unused observations for the calculation of hiXj, Pa2). 
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N e = N\ + AT 2 + N 3 — 2 x 3 if all 3 sequences are used for 
learning. 

Shared and exchanged information in time-delayed Ml 

The proposed algorithm uses the time-delayed mutual 
information to give directional sense in dynamical sys- 
tems. As a measure, for capturing system dynamics, the 
time-delayed MI contains both the exchanged informa- 
tion which is useful and the shared information which 
is not useful. However, Schreiber [32] premised that the 
time-delayed MI, because of its use of static probability, is 
limited and unable to distinguish between the exchanged 
information from shared information. Consequently, he 
proposed the concept of transfer entropy, using transi- 
tion probabilities rather than static probabilities, thereby 
ignoring static correlations due to the common history or 
common input signals. From this viewpoint, it implies that 
the transfer entropy would be more appropriate because 
the time-delayed MI, using static probability, will contain 
exchanged information with less 'strength than transfer 
entropy which is not influenced by static correlations. 

However, we note that the transfer entropy requires 
the estimation of very high-dimensional joint distribu- 
tions, i.e., (2d + I) dimensions where d is the Markov 
order. Thus, even with d = 3, hundreds to thousands of 
samples are required for satisfactory estimation of the 
7-dimension distribution. In contrast, the time-delayed 
MI requires estimation of only bi-dimensional distri- 
butions and is thus better able to cope with limited 
(few tens of samples) microarray data samples, as com- 
monly available for reconstructing genetic networks. If 
the number of samples increases in the future, e.g., 
due to advancements in technology for gene expres- 
sion profiling, the transfer entropy approach will be an 
important candidate for reverse engineering genetic net- 
works. 

Proposed approaches 

This section presents our GlobalMIT + algorithm for 
learning the globally optimal structure for a d-th order 
DBN with the MIT scoring metric in polynomial time. 
The original GlobalMIT algorithm for the case of the 
lst-order Markov DBN [25] can be considered as a spe- 
cial case of GlobalMIT + with d = 1. Our development of 
GlobalMIT + has made use of the same set of assumptions 
as proposed by [24]. While therein, the DBN learning 
problem is placed within a generic machine learning con- 
text, herein we are focusing our attention to the particular 
context of GRN modeling. Next, we list the required 
assumptions and discuss the associated rationales along 
with biological plausibility. 

Assumption 1. (acyclicity) Examination of the graph 
acyclicity is not required. 



This assumption is valid for DBNs with no intra time 
slice edges. For this class of DBN, as the edges are only 
directed forward in time, acyclicity is automatically sat- 
isfied. The biological implication of this assumption is 
that we may not be able to detect the instantaneous 
interactions. As stated previously, the majority of genetic 
interactions are time-delayed. However, if the sampling 
gap is large, we may consider some quick interactions 
as instantaneous. The effect of this constraint is that, if 
gene X\ regulates gene X2 almost instantly, their mutual 
information I(Xi,X2) will likely be maximized when their 
expression profiles are in synchrony, i.e., no shifting of any 
of the two sequences is involved. With Assumption 1 in 
place, we will have to consider two time-delayed mutual 
information values, 7 S (^1>^2) and I s (X2,Xi) (since I s is 
asymmetric). If these values are significantly weaker than 
I{Xi,X2) then the interaction between genes X\ and X2 
may go undetected. However, when the signal is smooth 
and is sampled in short time step, we found that shift- 
ing the expression profile by just one time unit will not 
often cause a large reduction in the MI value. This is 
because smooth time series have high auto-correlation 
at short lags, and thus, instantaneous interactions may 
still be captured by DBN models with only inter-time 
slice edges. The algorithmic implication of Assump- 
tion 1 becomes clear when we consider Assumption 2 
below: 

Assumption 2. (additivity) S(G : D) = YH=i s C^i> P a < : 
DlxiiiPni) where -D^uPa; denotes the restriction ofD to the 
values of the members ofXt U Pa,-. 

To simplify notation, we write s(P&i) for s (Xi, Pa, : 
D\xi U Pa/). Assumption 2 simply states that the scoring 
function decomposes over the variables and is satisfied 
by most scoring metrics such as BIC/MDL, BD and also 
clearly by MIT. However together with Assumption 1, 
their algorithmic implication is profound: these assump- 
tions allow us to compute the parent set for each node 
independently. Unlike the case of BN where the choice 
of parents for a certain node may affect the choice of 
parents of all the other nodes, for DBN (without intra 
time slice edges), the choice of parents for a node has 
no effect on the other nodes. Thus, the algorithms devel- 
oped based upon these two assumptions become very 
amenable to parallelization, i.e., each node can be learned 
independently with a separate computational thread. Still, 
exhaustive brute-force search for the optimal parent set 
will require exponential time for a d-th order DBN, 
because Pa, can be an arbitrary subset of X[ t — 1] U . . . U 
X[ t — d] and the number of all possible parent sets 
is 2 dn . In order to further reduce the search space, we 
rely on the special structure of the scoring metric, as 
follows: 
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Assumption 3. (splitting) s(Pa;) = w(Pa;) + v(Pa;) for 
some non-negative functions u and v satisfying Pa; C 
Pa;. w(Paj) < w(Pa;.). 

Assumption 4. (uniformity) |Pa,-| = |Pa^| =>■ »(Pa,-) = 
«(Pa{). 

Assumption 3 requires the scoring function to decom- 
pose into two components: v evaluating the accuracy of 
representing the distribution underlying the data by the 
network, and u measuring its complexity. Furthermore, u 
is required to be a monotonically non-decreasing function 
in the cardinality of Pa, (Assumption 4), i.e., the network 
gets more complex as more variables are added to the par- 
ent sets. However in its original form, the MIT scoring 
metric, having higher scores for better networks, does not 
abide by these assumptions. We overcome this by casting 
the problem as a minimization problem (similar to Dojer) 
where lower scored networks are better. We consider a 
variant of MIT as follows: 

S M it(G : D) = Zti 2N e ' Is&h X d )-S' MIT (G:D), (1) 

where X li =X[t-l]U...UX[(-4 This score admits 
the following decomposition over each variable (with the 
convention of /(X;, 0) = 0): 

smit (Pa;) = v M /r(Pa;) + WM/r(Pa;), (2) 
v M /r(Pa;) = 2N e ■ I s (X h X d ) - 2N e ■ I s (X h Pa;), (3) 
MM /r(Pa;) = TjLiXaJw (4) 

Roughly speaking, vmit measures the "error" of repre- 
senting the joint distribution underlying D by G, while 
umit measures the complexity of this representation. We 
make the following propositions: 

Proposition 1. S' MIT maximization is equivalent to Smit 
minimization. 

Proof. This is obvious, since J2"=i 2We ■ h{Xi,X d ) = 
constant. □ 

Proposition 2. vmit, umit satisfy assumption 3. 

Proof, vmit > 0 since of all possible parent sets Pa;, the 
full set X d has the maximum (shifted) mutual information 
with Xi. And since the support of the Chi-square distri- 
bution is R+, i.e., Xa,- > °> therefore Pa; c Pa^. =>• 0 < 
umitCPsh) < MM/r(Pap. □ 

While we note that umit does not satisfy Assump- 
tion 4, for applications where all the variables have the 
same number of states, it can be shown to satisfy this 
assumption. Within the context of GRN modeling from 
microarray data, this generally holds true, since it is a 
popular practice to discretize expression data of all genes 



to, e.g., 3 states corresponding to high, low and base-line 
expression value [15]. 

Assumption 5. (variable uniformity) All variables in X 
have the same number of discrete states k. 

Proposition 3. Under the assumption of variable unifor- 
mity, umit satisfies assumption 4. 

Proof. It can be seen that if |Pa;| = |Pa^| = s;, then 
w M /r(Pa;) = ww(Pap = £/Li X a ,(k-i) 2 k>- 1 - n 

Since mm/t (Pa;) is the same for all parent sets of 
the same cardinality, we can write Umit{\P^i\) in place 
of MMfr(Pa;). With Assumptions 1-5 satisfied, we can 
employ the following Algorithm 1, named globalMIT" 1 ", to 
find the globally optimal DBN with MIT, i.e., the one with 
the minimal Smit score. 

Theorem 1. Under assumptions 1-5, GlobalMIT + applied 
to each variable in X finds a globally optimal d-th order 
DBN under the MIT score. 

Algorithm 1 GlobalMIT + : Optimal d th -order DBN with MIT 

Pa; := 0 

for p = 1 to n d 

If umit(p) > •SM/r(Pa;) then return Pa;; Stop. 
P = argmin {s M /r(Y)|Y C X d ; |Y| = p] 
If SM/r(P) < 5 M /r(Pa;) then Pa; := P. 
end for 

Proof. The key point here is that once a parent set grows 
to a certain extent, its complexity alone surpasses the total 
score of a previously found sub-optimal parent set. In fact, 
all the remaining potential parent sets P omitted by the 
algorithm have a total score higher than the current best 
score, i.e., s M it(Pa) > MM/r(|Pa|) > s M /r(Pa;), where Pa; 
is the last sub-optimal parent set found. □ 

We note that the terms 2N e ■ I s (X it X d ) in the S M IT 
score in (1) are all constant and would not affect the out- 
come of our optimization problem. Knowing their exact 
value is however, necessary for the stopping criterion 
in Algorithm 1, and also for determining its complexity 
bound, as will be shown in Section "Complexity analysis" 
Calculating 7 S (X;, X d ) is by itself a hard problem, requiring 
in general, a space and time complexity of order 0(k nd+l ). 
However, for our purpose, since the only requirement for 
Vmit is that it must be non-negative, it is sufficient to 
use an upper bound of I s (Xi,X d ). Since a fundamental 
property of the mutual information states that I(U, V) < 
mm{H(U),H(V)}, i.e., mutual information is bounded by 
the corresponding entropies, we have: 

2N e ■ I s (Xt,X d ) < 2N e ■ H s (Xd, 
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where H s (Xi) is the entropy of Xi estimated from a d- 
time-unit shifted expression profile, i.e., . . . ,xaq\. 

Otherwise, we can use a universally fixed upper bound for 
all H s (Xi), that is log k, then: 

2N e ■ I s (Xt,X d ) <2N e -logk. 

Using these bounds, we obtain the following more practi- 
cal versions of cImit- 

v^ /r (Pa ; ) = 2N e ■ H s (X t ) - 2N e ■ I S {X U Pa;) (5) 
v^ /T (Pa ; ) = 2N e • log k - 2N e ■ I s (X h Pa«). (6) 



It is straightforward to show that Algorithm 1 and 
Theorem 1 an 
place of vmit- 



Theorem 1 are still valid when v' MIT or v' MIT are used in 



Complexity analysis 

Theorem 2. GlobalMIT* admits a polynomial worst- 
case time complexity of 0((nd) loSkNe ) in the number of 
variables and DBN order. 

Proof. Our aim is to find a number p* satisfying 
umit(p*) > s mit($)- Clearly, there is no need to examine 
any parent set of cardinality p* and over. In the worst case, 
our algorithm will have to examine all the possible parent 
sets of cardinality from 1 to p* — 1. We have: 

p" 

UMITiP*) > SMirW) O Xa,l,ai(j) - v mit(V>) 

= 2N e ■ Is(Xi,X d ). 

As discussed above, since calculating vmit is not conve- 
nient, we use v' M]T and v" mT instead. With v' MIT , p* can be 
found as: 



p = argmin 



while for v" mr : 



p = argmin 



p\J2x a ,l ir r i (j)>2N e -H s (X i ) 
7=1 



P^XaMtd) ^ 2N e -logk 



It can be seen that p* depends only on a, A" and N e . Since 
there are 0((nd) p *) subsets of X d with at most p* par- 
ents, and each set of parents can be scored in polynomial 
time, GlobalMIT + admits an overall polynomial worst- 
case time complexity in the number of variables n and 
network order d. While p* does not admit a closed-form 
solution (since Xa,hj cannot be analytically calculated), a 
large over-estimate of p* can be provided as follows. Note 
that Xa,hj is the value such that P(X 2 (hj) < XaJij) = a. 



Since generally a ^> 0.5, if we take the mean value (cor- 
responding roughly to a = 0.5) of the X 2 (hj) distribution, 
i.e., lij, as an under-estimate for Xa,kj> then: 

P* />*-! 

> 2N e ■ logk 
0> (A- - 1) (k p * - l) > 2N e ■ logk o p* 

'2N e ■ logk 



k-l 



Assuming N e ^> logk, we can see that p* ~ log /( (7V e ), 
and the algorithm admits an overall complexity of 

0({nd) l °Zk N °). □ 

Let us now compare this bound with those of the 
algorithms for learning the globally optimal DBN under 
the BIC/MDL and BDe scoring metrics as proposed by 
[24], and implemented in the BNFinder software [21]. 
For BIC/MDL, p* MDL is given by [log^yYe], while for 
BDe, p* BDe = \N e log^-i k] , where the distribution P(G) oc 
X^I Pa 'l, with a penalty parameter 0 < k < 1, is used as 
a prior over the network structures [24, default value 
log A -1 = 1 for BNFinder]. In general, p* BDe scales lin- 
early with the number of effective data items N e , making 
its value less of practical interest, even for small data 
sets. Moreover, this bound becomes meaningless when 
N e > n, as pg De > n, meaning that in the worst case 
BNFinder +BDe will have to investigate all the possible 
parent sets. On the other hand, it can be seen that the first 
order GlobalMIT and BNFinder+MDL admits roughly the 
same worst-case computational complexity. 

The GlobalMIT* algorithm 

It is noted that the search space has been expanded from 
X[t - 1] in the case of the lst-order DBN, to X d = 
X[ t - 1] U . . . U X[ t - d] for the case of the J th -order 
DBN. Roughly, the number of variables has been multi- 
plied d times in order to accommodate the higher-order 
regulations. Such a multiplicative expansion in the search 
space may be very expensive, especially for a determin- 
istic global optimization algorithm such as GlobalMIT + . 
For very large networks, it may be useful to consider the 
following additional assumption: 

Assumption 6. (non-redundant, optimal-lag interaction) 
No multiple edges with different time lags exist between a 
parent Xi and its child Xj. Furthermore, the only one edge 
allowed, if it exists, must take place at the optimal lag <5*., 
where Sfj = argmax [l s (Xj,Xt[t — S] )|1 < S < d\. 
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This assumption restricts that for each node Xi, there 
may be only one single link to any node Xj at the most- 
probable time lag where their mutual information is max- 
imized. With this assumption in place, the search space 
for each variable Xj reduces from X rf = X[ t — 1] U . . . U 

X[ t - d] to X* = {x - S*j J } , which is equiva- 
lent in size to the first-order GlobalMIT algorithm. Thus 
Assumption 6 provides a trade-off between optimality and 
coverage: while the search is performed only on n variables 
at a pre-determined lag thereby significantly reducing the 
computational cost, this lag can take any value from 1 to 
d detecting delayed genetic interactions at the most likely 
time lag. We shall refer to this variant of GlobalMIT + , 
when Assumption 6 is employed, as GlobalMIT . It can be 
easily seen that, for any high order d > 1, GlobalMIT still 
admits the same complexity as the first order GlobalMIT. 

Results and discussion 

This section presents the experimental evaluation on 
GlobalMIT" 1 "^*. Our proposed algorithms are imple- 
mented within the Matlab/C++ GlobalMIT* toolbox, 
freely available as online supplementary material (Addi- 
tional file 1). We compare our approach with two other 
global optimization algorithms for learning DBN under 
the MDL and BDe metrics, namely BNFinder+MDL 
and BNFinder+BDe, which are part of the Python-based 
BNFinder software [21]. As elaborated in the previ- 
ous section, the BNFinder+BDe algorithm is generally 
very expensive, and hence not feasible for large or even 
medium (few tens of nodes) scale networks. In these cases, 
we replace BNFinder+BDe with BANJO [33], a Java-based 
software package for learning DBN using the BDe metric 
via a stochastic global optimization method, in particular 
simulated annealing. 

It is noted that the GlobalMIT + toolbox supports multi- 
threading to maximally exploit the currently popular 
multi-core PC systems. We conducted our experiments 
on a quad-core i7 desktop PC with 8Gb of main mem- 
ory, running Win7 64bit, which is a typical off-the-shelf 
PC configuration at the time this paper was written. Intel 
core i7 processors contain 4 separate cores, each can han- 
dle 2 independent threads concurrently. We shall execute 
GlobalMIT + with 6 threads in parallel (the remaining 
two being reserved for system and interface processes). 
BANJO also supports multi-threading, whereas BNFinder 
does not. While we could have run all algorithms with 
only a single thread, for a "fair" comparison in terms 
of run-time, our objective in carrying out the experi- 
ments this way is to highlight the capability and benefit 
of parallelization of GlobalMIT + . The 1-thread execu- 
tion time would be roughly three to five times longer in 
our observation. As for parameter setting, BNFinder was 
run with default settings, while BANJO was run with 6 



threads, simulated annealing+random move as the search 
engine, and its run-time was set to, either that required 
by GlobalMIT + or at least 10 minutes, whichever longer. 
GlobalMIT + has two parameters, namely the significance 
level a, to control the trade-off between goodness-of-fit 
and network complexity, and the DBN order d. Adjusting 
a will affect the sensitivity and precision of the discov- 
ered network, very much like its affect on the Type-I and 
Type-II error of the mutual information test of indepen- 
dence. De Campos [26] suggested using high significance 
levels, i.e., between 0.999 and 0.9999. We note that for 
smaller number of samples, a lower level of significance 
a may be necessary to avoid overly penalizing network 
complexity. Thus, in our experiments we set a = 0.999 
for N e < 100 and a = 0.9999 otherwise. The choice of 
a suitable DBN order d, on the other hand, is both 
species-specific and data-specific, in particular the data 
sampling rate. For example, in mammals, the transcrip- 
tional regulatory time delay can be from several minutes 
to several tens of minutes, and is composed of two com- 
ponents: the TF translation/post-translational process- 
ing/translocation time (~ 10.5 ± 4 mins), and the target 
gene transcription and post-transcription processing time 
(~ 20 — 40 mins) [28]. Also, for a higher data sampling 
rate, a higher d value is needed to cover the same time 
delay. It is also noted that increasing d will decrease the 
number of effective data points available for learning. In 
our experiments, we experimentally set d from 1 to several 
time units, depending upon the sampling rate. Whenever 
necessary, gene expression data were discretized using 
3-state quantile discretization. 

Small scale E. Coli network 

We study the E. coli SOS system [34] which involves lexA, 
recA and more than 30 other genes they directly regu- 
late. In normal condition, LexA binds to the promoter 
regions of these genes and acts as a master repressor. 
When the DNA is damaged, the RecA protein senses the 
damage and triggers LexA autocleavage. Drop in LexA 
level leads to de-repression of the SOS genes. When DNA 
repair completes, RecA stops mediating LexA autocleav- 
age, LexA accumulates and represses the SOS genes again. 
We used the expression data gathered in [34] for 8 genes, 
namely uvrD, lexA, umuD, recA, uvrA, uvrY, ruvA and 
polB, to reconstruct the interactions between these genes. 
The data set contains 4 time series, each of 50 observa- 
tions taken at 6-minute interval, under two UV exposition 
levels. Since the dynamics of each gene in all time series 
are similar, we can take the mean value of these time series 
as input to the algorithms. Thus, the input data consists of 
8 genes x 50 observations. 

For this small network, GlobalMIT + and BNFinder 
require only a few seconds, while BANJO was executed for 
10 minutes with 6 threads in parallel. The experimental 
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results are reported in Figure 3. GlobalMIT + (d = 1), 
BNFinder (BDe & MDL) all returned the same network 
in Figure 3(b), with ruvA being disconnected. Overall, 
this structure closely reflects the SOS network, in which 
the lexA/recA compound acts as a hub that controls the 
other genes. BANJO returned the network in Figure 3(c), 
in which the hub-structure is basically also identified, 
but with several more false interactions between the tar- 
get genes, e.g., between umuD and uvrD/uvrA. Note 
that the ruvA gene is also disconnected in the BANJO's 
recovered network. When testing with higher orders, 
GlobalMIT + discovered a similar hub structure. The most 
complete network was discovered at d = 6 in (Figure 3d), 
in which all the interactions between lexA/recA and other 
genes were recovered. Furthermore, the mutual interac- 
tion between lexA and recA were also correctly identified. 
Additional experiments to test the effect of data dis- 
cretization on this data set are presented in the online 
supplementary material (Additional File 2). 

Medium scale synthetic network for glucose homeostasis 

We study a glucose homeostasis network of 35 genes 
and 52 interactions, first proposed by Le et al. [35]. The 
network, which shows the genetic interactions that con- 
trol glucose metabolism in perinatal hypatocytes, was the 
result of an extensive literature review of the biological 



components affecting perinatal glucose metabolism. Le et 
al. [35] modeled the interactions using conditional prob- 
ability tables with two discrete states, with the strength 
of the interactions chosen to be consistent with biological 
variation. They provided a program to generate synthetic 
data sets from this network using a static Bayesian net- 
work model. It is clear from Figure 4 that the network 
has a cascade hierarchical structure, and is reasonably 
complex, with several genes being regulated by multi- 
ple transcription factors. In order to create a synthetic 
dynamic Bayesian network for testing, we modified Le 
et alls network as follows. First, we organized the nodes 
into 4 levels, with the top level comprising of the mas- 
ter transcription factors (TFs), and the interaction order 
between nodes in adjacent levels assumed to be one. The 
network in Figure 4 thus contains time-delayed interac- 
tions of orders 1 (13 edges), 2 (23 edges) and 3 (16 edges). 
Then, from the data generated by Le et alls program, 
we simply shifted forward the expression profiles of the 
2nd-, 3rd- and 4th-level nodes by 1, 2 and 3 time units 
respectively to create data for this DBN model. We gen- 
erated ten time series of 125 observations, then for each 
N e {25, 50, 75, 100, 125} we took the first N observations 
of these series for testing. Since the network structure 
in this experiment is known in advance by design, we 
can calculate the true positive (TP), false positive (FP) 
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Figure 4 The hepatic glucose homeostasis network: black, blue, red colors for 1 st-, 2nd- and 3rd-order interactions respectively. 



and false negative (FN) edges. The meanistandard devia- 
tion values for the performance metrics, namely sensitivity 
(=TP/(TP+FN)), precision (=TP/(TP+FP)) and runtime, 
over 10 time series for all algorithms are reported in 
Table 1. 

It is noted that we have omitted BNFinder+BDe in this 
experiment. The reason is that this algorithm becomes too 
expensive even for this medium network. For example, at 
N = 25, BNFinder+BDe requires around 1 minute. The 
execution time quickly increase to 1206±167 mins at N = 
50. And at N = 75, we could not even complete analyzing 
the first of the 10 datasets: the execution was abandoned 
after 3 days, with BNFinder+BDe having learnt the par- 
ents for only 2 nodes. Of the algorithms reported in 
Table 1, GlobalMIT, BANJO and BNFinder+MDL are lim- 
ited to learning the lst-order DBN It can be observed 
that GlobalMIT and BNFinder+MDL learned networks 
with similar sensitivity and precision, with both perfor- 
mance metrics improving as N increased. On the other 
hand, BANJO achieved a slightly better sensitivity, but at 
the cost of a significantly lower precision. This observa- 
tion is in concordance with our earlier experiment on the 
E. coli SOS network, in which BANJO also learned many 
more edges than GlobalMIT" 1 " and BNFinder. This result 
also highlights the major advantage of deterministic global 
optimization based approaches (GlobalMIT + , BNFinder) 
over stochastic global optimization based method such as 
BANJO. Wherever applicable, these methods never get 
stuck in local minima, and are able to deliver consis- 
tent and high quality results. Of course, BANJO on the 
other hand is the choice for very large datasets where 
deterministic methods are computationally infeasible. 

As for higher-order DBN learning algorithms, both 
GlobalMIT + and GlobalMIT* (with d = 3) achieves sig- 
nificantly better sensitivity compared to first-order DBN 
learning algorithms (GlobalMIT, BNFinder, BANJO). The 
improved sensitivity is mainly credited to the abil- 
ity of these algorithms to cover all the possible time- 
delayed interactions between the genes. More specifi- 
cally, at N = 125, GlobalMIT discovers on average 16.9 
high-order interactions, i.e., 43% of the total high-order 



interactions. Meanwhile, BANJO and BNFinder+MDL 
only recover on average 5.5 (14%) and 4.6 (12%) high- 
order interactions respectively. It is also noticeable from 
this experiment that GlobalMIT* delivered results almost 
identical to GlobalMIT + but with a much shorter time, 
comparable to the lst-order GlobalMIT. 

Large scale cyanobacterial genetic networks 

This section presents our analysis on a large scale 
cyanobacterial network. Cyanobacteria are the only 
prokaryotes that are capable of photosynthesis, and in 
recent years have received increasing interest [36], due to 
their high efficiency in carbon sequestration and poten- 
tial for biofuel production (up to 30 times more efficient 
than terrestrial oilseed crops). These organisms therefore 
are credited with holding the key to solving two of the 
most critical problems of our time, namely climate change 
and the dwindling fossil fuel reserves. Despite their evo- 
lutionary and environmental importance, the study of 
cyanobacteria using modern high throughput tools and 
computational techniques has somewhat lagged behind 
other model organisms. Herein, we focus on Cyanothece 
sp. 51 142, hereafter Cyanothece, a unicellular cyanobacte- 
rial strain that is involved not only in photosynthesis but 
also in nitrogen fixation in the same cell. As a byprod- 
uct of nitrogen fixation, Cyanothece has been recently 
shown to produce biohydrogen at very high rates that are 
several fold higher than previously described hydrogen- 
producing photosynthetic microbes [37]. 

We used transcriptomic data from [36], where samples 
from cells grown in alternating 12h light-dark cycles 
were collected every 4h over a 48h time course. We 
analyze the subset of 733 genes that have a 2-fold expres- 
sion in at least one of the 12 time points, as pub- 
lished in [36]. Since the sampling gap of 4h in this 
experiment is relatively large as compared to regular 
biological regulatory time lag, we used spline interpo- 
lation to interpolate two more data points in between 
each two actual measurements, i.e., upsampling the given 
time series at an lh20' interval. The resulting data 
set thus contains 733 genes x 34 time points. For this 



Table 1 Experimental results for the hepatic glucose homeostasis network 



GlobalMIT (d = 1 ) GlobalMIT* (d = 3) GlobalMIT + (d = 3) BANJO BNFinder+MDL 



N 


Pr 


Se 


Time 


Pr 


Se 


Time 


Pr 


Se 


Time 


Pr 


Se 


Time 


Pr 


Se 


Time 


25 


75 ± 17 


9±2 


0±0 


67 ±8 


18 ± 5 


0±0 


64 ± 12 


1 8 ± 5 


0±0 


12 ± 2 


22 ±3 


10 ± 0 


64 ± 17 


9±2 


0±0 


50 


82 ± 14 


19± 3 


0±0 


80 ± 10 


35 ±3 


0±0 


77 ± 12 


35 ±4 


0±0 


25 ±5 


27 ±6 


10 ± 0 


88 ± 12 


18 ± 4 


1 ±0 


75 


85 ± 12 


24 ± 3 


0±0 


85 ±6 


45 ±4 


0±0 


81 ±8 


46 ±4 


9±0 


34 ±4 


28 ±2 


10 ± 0 


85 ±11 


23 ±3 


7±0 


100 


94 ±7 


24 ± 2 


2±0 


98 ±4 


46 ±4 


0±0 


98 ±4 


46 ±4 


11 ±0 


41 ±5 


29 ±3 


11 ±0 


85 ±8 


25 ±3 


14 ± 0 


125 


91 ±8 


25 ±2 


2±0 


97 ±4 


50 ±3 


2±0 


97 ±4 


50 ±4 


482 ± 39 


43 ±4 


30 ±3 


482 ± 39 


82 ±8 


27 ±2 


20 ±0 



Se: percent sensitivity; Pr: percent precision; Time: in minutes. 
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large network, we employed the GlobalMIT version, 
with order d = 3 (which indeed covers one time point 
lag on the original data set). GlobalMIT inferred the 
network as in Figure 5(a) after 14.5 mins of execution 
time. Upon visualization with Cytoscape [38] using a 
standard layout algorithm, the network shows a clear 
scale-free topology, with the majority of nodes having 
only a few connections and a small number of hubs 
having many connections. The node degree in a scale-free 
network distributes according to a power-law distribu- 
tion, P(x) oc x~ y , with the scaling parameter y typi- 
cally between 2 and 3 for various networks in nature, 
society and technology. The scale-free property is thought 
to be a key organization feature of cellular networks, 
as supported by recent analysis on model organisms 
such as S. cerevisiae and C. elegans [39,40]. It is noted 
that some authors use the scale-free property as the 
prior input for their algorithms to, either encourage or 
enforce them to produce scale-free networks as output 



[40,41]. Herein however, we have not used any such prior 
information. 

To formalize this observation, we fit the node degree 
(counting both in- and out-degree) in the GlobalMIT 
inferred network to the power-law distribution using 
the method of maximum likelihood (ML). The ML esti- 
mate for y in this network is 2.24, falling well within 
the typical range. From Figure 6 it can be seen that 
the observed degree distribution fits well with the the- 
oretical P{x) = x~ 2 ' 2i curve. In order to verify that 
the scale-free structure is not merely an artefact of 
the inference algorithm, we test GlobalMIT with the 
same parameters on the same microarray data set, but 
with every gene expression profile randomly shuffled. 
The resulting network is shown in Figure 5(b). Using 
the same layout algorithm, no clear modular structure 
and hubs are visually recognizable for this network. 
Also, as clear from Figure 6, the node degree distri- 
bution largely deviates from a power-law curve, being 




(a) GlobalMlf (d=3) (b) GlobalMlf on random-data 




(c) BNFinder+MDL (d) BANJO 

Figure 5 The Cyanothece sp. 51 142 reconstructed genetic networks, visualized with Cytoscape. Node size is proportional to the node 
connectivity. 
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very short-tailed with the largest hubs having only 7 
connections. 

We next tested BNFinder and BANJO on this data set. 
BNFinder+BDE was abandoned after 3 days of execu- 
tion without finishing. BNFinder+MDL on the other hand 
is relatively fast, requiring only 4 mins. The resulting 
network, shown in Figure 5(c), also exhibits a scale-free 
structure. The ML estimate for y in this network is, inter- 
estingly, 2.25, very close to that of the GlobalMIT net- 
work. BANJO was run with 6 threads for lh. The resulting 

Table 2 Functional enrichment analysis for the top 20 hubs 



GlobalMIT" network 



Hub 


Degree 


Enriched function 


Corrected p-value 


cce_4432 


16 


Nitrogen fixation 


4.5 E-5 


cce_3394 


16 


Nitrogen fixation 


1.7E-5 


cce_3974 


14 


Photosynthesis, dark reaction 


1 4E-2 


cce_0997 


13 


Photosystem 1 


1.3 E-5 


cce_0103 


12 


Plasma membrane proton-transporting 


1.7E-5 


cce_0589 


11 


Signal transducer 


9.4E-3 


ccej 620 


10 


Photosystem II reaction center 


2E-2 


ccej 578 


10 


Structural constituent of ribosome 


1E-2 


cce_2038 


10 


Response to chemical stimulus 


4.5 E-2 


cce_4486 


9 


Photosynthetic membrane 


3.1 E-2 


BNFinder+MDL network 


cce_3394 


20 


Nitrogen fixation 


3.7E-8 


cce_3377 


17 


Proton-transporting ATPase activity 


2.1 E-7 


cce_3898 


15 


Structural constituent of ribosome 


2.5E-11 


cce J 943 


11 


peptidoglycan biosynthetic process 


3.4E-2 


cce_2639 


9 


thiamine-phosphate kinase activity 


2.1 E-2 


cce J 620 


8 


Photosystem II reaction center 


1E-2 


BANJO network 



network, shown in Figure 5(d), does not appear to possess 
a scale-free topology, and the node degree distribution 
also largely deviates from a power-law curve. In fact, the 
BANJO network node degree distribution resembles that 
of a random Erdos-Renyi graph with the same number of 
nodes and connections (Figure 6). 

We next perform functional enrichment analysis for the 
top hubs in each network. For this purpose, we gathered 
annotation data for Cyanothece sp. 51142 from Cyanobase 
[42, access May 2011]. Cyanobacteria in general and 



cce_4663 



10 



Calcium ion binding 



3.4E-2 
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Cyanothece in particular are not very well annotated. For 
example, to date, nearly half of Synechocystis sp. PCC 
6803's genes, the best studied cyanobacterium, remain 
unannotated. Therefore, we supplemented Cyanobase 
annotation with homology search using the Blast2GO 
software suit [43]. In total, these combined efforts gave us 
annotation data for 542 out of 733 genes in our study. We 
then employed BiNGO [44] for gene ontology functional 
category enrichment analysis, using the hypergeomet- 
ric test for functional over-representation, and False 
Discovery Rate (FDR) as the multiple hypothesis testing 
correction scheme. Only a corrected jj-value of less than 

0. 05 is considered significant. Following these procedures, 
of the top 20 hubs in the GlobalMIT network, 10 were 
found to be significantly enriched in major Cyanothece 
cellular processes, such as nitrogen fixation, photosyn- 
thesis and other closely related pathways, as presented 
in Table 2. Since the wet-lab experimental setting herein 
involves alternative light-dark cycles, this result is found 
to be highly biologically relevant. Cyanothece strains 
thrive in marine environments, and in addition to car- 
bon fixation through photosynthesis, these bacteria can 
also perform nitrogen fixation by reducing atmospheric 
dinitrogen to ammonia. Since the nitrogenase enzyme is 
highly sensitive to oxygen, Cyanothece temporally sepa- 
rates these processes within the same cell, so that oxygenic 
photosynthesis occurs during the day and nitrogen fix- 
ation during the night [36]. Thus, under normal growth 
condition with regular dark-light cycles and without 
any stress, it could be expected that photosynthesis and 
nitrogen fixation are the two most active Cyanothece cel- 
lular processes. This is reflected clearly in the GlobalMIT 
reconstructed network. Upon inspecting BNFinder+MDL 
network, 6 out of the top 20 hubs were found to be signif- 
icantly enriched, also in major relevant cellular processes. 
It is noted that while GlobalMIT show the most hubs, 
BNFinder+MDL manages to recover several hubs with 
significantly better corrected p-value. In particular, 3 hubs 
for nitrogen fixation, proton transport and ribosome were 
recovered with significantly smaller corrected j?-value. 
However, as opposed to GlobalMIT , other important 
functional hubs for photosynthesis, photosystem I & 
II were missing. BANJO on the other hand produced 
relatively poor result, with only 1 out of 20 top hubs 
turned out to be significantly enriched, but not related 
to any major cellular pathway. The overall results suggest 
that both GlobalMIT* and BNFinder+MDL successfully 
reconstructed biologically plausible network structures, 

1. e., scale-free with a reasonable scaling parameter value, 
and with functionally enriched modules relevant to the 
wet-lab experimental condition under study. GlobalMIT" 
managed to produce more enriched hubs, as a result of 
the higher order DBN model employed and the improved 
MIT scoring metric. BANJO on the other hand, generally 



failed to produce a plausible network structure. This 
experimental result thus highlights the advantage of 
deterministic global optimization approach, as employed 
by GlobalMIT and BNFinder+MDL, versus a stochastic 
global optimization approach as employed by BANJO. 

Conclusion 

In this paper, we have introduced GlobalMIT" 1 " and 
GlobalMIT*, two DBN-based algorithms for reconstruct- 
ing gene regulatory networks. The GlobalMIT suite makes 
use of the recently introduced MIT scoring metric, which 
is built upon solid principles of information theory, having 
competitive performance compared against the other tra- 
ditional scoring metrics such as BIC/MDL and BDe. In 
this work, we have further shown that MIT possesses 
another very useful characteristic in that when placed 
into a deterministic global optimization framework, its 
complexity is very reasonable. As theoretically shown and 
experimentally verified, GlobalMIT exhibits a much lower 
complexity compared to the BDe-based algorithm, i.e., 
BNFinder+BDe, and is comparable with the MDL-based 
algorithm, i.e., BNFinder+MDL. GlobalMIT + /* are also 
designed to learn high-order variable time delayed genetic 
interactions that are common to biological systems. Fur- 
thermore, the GlobalMIT* variant has the capability of 
reconstructing relatively large-scale networks. As shown 
in our experiments, GlobalMIT" 1 "/* are able to reconstruct 
genetic networks with biologically plausible structure and 
enriched submodules significantly better than the alter- 
native DBN-based approaches. Our current and future 
study of GlobalMIT" 1 "/* mainly focuses on the applica- 
tion of these newly developed algorithms to elucidate 
the gene regulatory network of Cyanothece, Synechocys- 
tis, Synechococcus amongst other cyanobacteria strains 
having high potential for biofuel production and carbon 
sequestration. 
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